Abstract

For simple, real-valued random variables, the expectation is the probability-weighted average of the
values taken on. It may be viewed as the center of mass for the probability mass distribution on the line.

1 Introduction 

The probability that a real random variable X takes a value in a set M of real numbers is interpreted as the
likelihood that the observed value \(X(\omega)\) on any trial will lie in M. Historically, this idea of likelihood is rooted
in the intuitive notion that if the experiment is repeated enough times the probability is approximately the
fraction of times the value of X will fall in M. Associated with this interpretation is the notion of the average of
the values taken on. We incorporate the concept of mathematical expectation into the mathematical model
as an appropriate form of such averages. We begin by studying the mathematical expectation of simple
random variables, then extend the definition and properties to the general case. In the process, we note the
relationship of mathematical expectation to the Lebesgue integral, which is developed in abstract measure
theory. Although we do not develop this theory, which lies beyond the scope of this study, identification of
this relationship provides access to a rich and powerful set of properties which have far-reaching consequences
in both application and theory.

2 Expectation for simple random variables 

The notion of mathematical expectation is closely related to the idea of a weighted mean, used extensively 
in the handling of numerical data. Consider the arithmetic average \(\bar{x}\) of the following ten numbers: 1, 2, 2,
2, 4, 5, 5, 8, 8, 8, which is given by

\[ \bar{x} = \frac{1}{10}(1 + 2 + 2 + 2 + 4 + 5 + 5 + 8 + 8 + 8) \tag{1} \]

Examination of the ten numbers to be added shows that five distinct values are included. One of the ten,
or the fraction 1/10 of them, has the value 1, three of the ten, or the fraction 3/10 of them, have the value
2, 1/10 has the value 4, 2/10 have the value 5, and 3/10 have the value 8. Thus, we could write

\[ \bar{x} = 0.1 \cdot 1 + 0.3 \cdot 2 + 0.1 \cdot 4 + 0.2 \cdot 5 + 0.3 \cdot 8 \tag{2} \]

♦Version 1.6: Sep 18, 2009 1:17 pm -0500 
The pattern in this last expression can be stated in words: Multiply each possible value by the fraction
of the numbers having that value and then sum these products. The fractions are often referred to as the 
relative frequencies. A sum of this sort is known as a weighted average. 
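
As a quick check of this pattern, the following sketch recomputes the average of the ten numbers both ways. It is written in Python rather than MATLAB purely for illustration, and the variable names are our own.

```python
from collections import Counter
from fractions import Fraction

data = [1, 2, 2, 2, 4, 5, 5, 8, 8, 8]
n = len(data)

# Ordinary arithmetic average: sum of the numbers divided by how many there are.
plain_average = Fraction(sum(data), n)

# Relative frequency of each distinct value: (count of that value) / n.
rel_freq = {t: Fraction(f, n) for t, f in Counter(data).items()}

# Weighted average: each distinct value times its relative frequency, summed.
weighted_average = sum(t * p for t, p in rel_freq.items())

print(plain_average, weighted_average)  # 9/2 9/2
```

Exact fractions are used so that the two results can be compared for equality without rounding concerns.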

In general, suppose there are n numbers \(\{x_1, x_2, \ldots, x_n\}\) to be averaged, with \(m \le n\) distinct values
\(\{t_1, t_2, \ldots, t_m\}\). Suppose \(f_1\) have value \(t_1\), \(f_2\) have value \(t_2\), ..., \(f_m\) have value \(t_m\). The \(f_i\) must add to n.
If we set \(p_i = f_i/n\), then the fraction \(p_i\) is called the relative frequency of those numbers in the set which
have the value \(t_i\), \(1 \le i \le m\). The average \(\bar{x}\) of the n numbers may be written

\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = \sum_{j=1}^{m} t_j p_j \tag{3} \]

In probability theory, we have a similar averaging process in which the relative frequencies of the various
possible values of X are replaced by the probabilities that those values are observed on any trial.

Definition. For a simple random variable X with values \(\{t_1, t_2, \ldots, t_n\}\) and corresponding probabilities
\(p_i = P(X = t_i)\), the mathematical expectation, designated E[X], is the probability-weighted average of the
values taken on by X. In symbols

\[ E[X] = \sum_{i=1}^{n} t_i P(X = t_i) = \sum_{i=1}^{n} t_i p_i \tag{4} \]

Note that the expectation is determined by the distribution. Two quite different random variables may have 
the same distribution, hence the same expectation. Traditionally, this average has been called the mean, or 
the mean value, of the random variable X. 
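
A minimal Python sketch of the definition follows. The function name `expectation` and the three-point distribution are hypothetical, chosen only to illustrate the probability-weighted sum; they are not part of the original material.

```python
import math

def expectation(values, probs):
    """Probability-weighted average of the values of a simple random variable."""
    if len(values) != len(probs):
        raise ValueError("values and probs must have the same length")
    if any(p < 0 for p in probs) or not math.isclose(sum(probs), 1.0):
        raise ValueError("probs must be nonnegative and sum to 1")
    return sum(t * p for t, p in zip(values, probs))

# Hypothetical distribution, for illustration only.
t = [-1, 0, 2]
p = [0.2, 0.5, 0.3]
print(expectation(t, p))  # approximately 0.4, up to floating-point rounding
```

Only the values and their probabilities enter the computation, which reflects the remark that the expectation is determined by the distribution.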

Example 1: Some special cases 

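1. Since \(X = aI_E = 0 \cdot I_{E^c} + aI_E\), we have \(E[aI_E] = aP(E)\).

2. For X a constant c, \(X = cI_\Omega\), so that \(E[c] = cP(\Omega) = c\).

3. If \(X = \sum_{i=1}^{n} t_i I_{A_i}\) then \(aX = \sum_{i=1}^{n} a t_i I_{A_i}\), so that

\[ E[aX] = \sum_{i=1}^{n} a t_i P(A_i) = a \sum_{i=1}^{n} t_i P(A_i) = aE[X] \tag{5} \]
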
Figure 1: Moment of a probability distribution about the origin. (Figure annotation: E[X] = sum of moments = center of mass.)



Mechanical interpretation 

In order to aid in visualizing an essentially abstract system, we have employed the notion of probability
as mass. The distribution induced by a real random variable on the line is visualized as a unit of probability
mass actually distributed along the line. We utilize the mass distribution to give an important and helpful
mechanical interpretation of the expectation or mean value. In Example 6 (see footnote 1) in "Mathematical Expectation:
General Random Variables", we give an alternate interpretation in terms of mean-square estimation.

Suppose the random variable X has values \(\{t_i : 1 \le i \le n\}\), with \(P(X = t_i) = p_i\). This produces a
probability mass distribution, as shown in Figure 1, with point mass concentration in the amount of \(p_i\) at
the point \(t_i\). The expectation is

\[ \sum_{i} t_i p_i \tag{6} \]

Now \(|t_i|\) is the distance of point mass \(p_i\) from the origin, with \(p_i\) to the left of the origin iff \(t_i\) is negative.
Mechanically, the sum of the products \(t_i p_i\) is the moment of the probability mass distribution about the
origin on the real line. From physical theory, this moment is known to be the same as the product of the
total mass times the number which locates the center of mass. Since the total mass is one, the mean value
is the location of the center of mass. If the real line is viewed as a stiff, weightless rod with point mass \(p_i\)
attached at each value \(t_i\) of X, then the mean value \(\mu_X\) is the point of balance. Often there are symmetries
in the distribution which make it possible to determine the expectation without detailed calculation.
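
The balance-point interpretation can be checked numerically: the net moment of the mass distribution about the mean should be zero. The Python sketch below uses a hypothetical four-point distribution and a helper name, `moment_about`, of our own choosing.

```python
from fractions import Fraction as F

# Hypothetical point masses p[i] located at t[i]; total mass is one.
t = [-2, 0, 1, 4]
p = [F(1, 4), F(1, 4), F(3, 8), F(1, 8)]
assert sum(p) == 1

def moment_about(b):
    """Net moment of the mass distribution about the point b."""
    return sum((ti - b) * pi for ti, pi in zip(t, p))

mean = moment_about(0)       # moment about the origin equals E[X]
print(mean)                  # 3/8
print(moment_about(mean))    # 0, so the rod balances at the mean
```

Since the total mass is one, the moment about any point b works out to the mean minus b, which vanishes exactly when b is the mean.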

Example 2: The number of spots on a die 

Let X be the number of spots which turn up on a throw of a simple six-sided die. We suppose each
number is equally likely. Thus the values are the integers one through six, and each probability is
1/6. By definition

\[ E[X] = \frac{1}{6} \cdot 1 + \frac{1}{6} \cdot 2 + \frac{1}{6} \cdot 3 + \frac{1}{6} \cdot 4 + \frac{1}{6} \cdot 5 + \frac{1}{6} \cdot 6 = \frac{1}{6}(1 + 2 + 3 + 4 + 5 + 6) = \frac{7}{2} \tag{7} \]

Although the calculation is very simple in this case, it is really not necessary. The probability
distribution places equal mass at each of the integer values one through six. The center of mass is
at the midpoint.

1 "Mathematical Expectation: General Random Variables", Example 6: Alternate interpretation of the mean value
<http://cnx.Org/content/m23412/latest/#fs-id7202349>

Example 3: A simple choice 

A child is told she may have one of four toys. The prices are $2.50, $3.00, $2.00, and $3.50,
respectively. She chooses one, with respective probabilities 0.2, 0.3, 0.2, and 0.3 of choosing the first,
second, third or fourth. What is the expected cost of her selection?

\[ E[X] = 2.00 \cdot 0.2 + 2.50 \cdot 0.2 + 3.00 \cdot 0.3 + 3.50 \cdot 0.3 = 2.85 \tag{8} \]

For a simple random variable, the mathematical expectation is determined as the dot product of the value 
matrix with the probability matrix. This is easily calculated using MATLAB. 

Example 4: MATLAB calculation for Example 3 

X = [2 2.5 3 3.5];      % Matrix of values (ordered)
PX = 0.1*[2 2 3 3];     % Matrix of probabilities
EX = dot(X,PX)          % The usual MATLAB operation

EX = 2.8500

Ex = sum(X.*PX)         % An alternate calculation

Ex = 2.8500

ex = X*PX'              % Another alternate

ex = 2.8500

Expectation and primitive form 

The definition and treatment above assume X is in canonical form, in which case

\[ X = \sum_{i=1}^{n} t_i I_{A_i}, \text{ where } A_i = \{X = t_i\}, \text{ implies } E[X] = \sum_{i=1}^{n} t_i P(A_i) \tag{9} \]

We wish to ease this restriction to canonical form.

Suppose simple random variable X is in a primitive form

\[ X = \sum_{j=1}^{m} c_j I_{C_j}, \text{ where } \{C_j : 1 \le j \le m\} \text{ is a partition} \tag{10} \]

We show that

\[ E[X] = \sum_{j=1}^{m} c_j P(C_j) \tag{11} \]

Before a formal verification, we begin with an example which exhibits the essential pattern. Establishing 
the general case is simply a matter of appropriate use of notation. 

Example 5: Simple random variable X in primitive form 

\[ X = I_{C_1} + 2I_{C_2} + I_{C_3} + 3I_{C_4} + 2I_{C_5} + 2I_{C_6}, \text{ with } \{C_1, C_2, C_3, C_4, C_5, C_6\} \text{ a partition} \tag{12} \]

Inspection shows the distinct possible values of X to be 1, 2, or 3. Also,

\[ A_1 = \{X = 1\} = C_1 \bigvee C_3, \quad A_2 = \{X = 2\} = C_2 \bigvee C_5 \bigvee C_6, \quad \text{and} \quad A_3 = \{X = 3\} = C_4 \tag{13} \]

so that

\[ P(A_1) = P(C_1) + P(C_3), \quad P(A_2) = P(C_2) + P(C_5) + P(C_6), \quad \text{and} \quad P(A_3) = P(C_4) \tag{14} \]

Now

\[ E[X] = P(A_1) + 2P(A_2) + 3P(A_3) = P(C_1) + P(C_3) + 2[P(C_2) + P(C_5) + P(C_6)] + 3P(C_4) \tag{15} \]

\[ = P(C_1) + 2P(C_2) + P(C_3) + 3P(C_4) + 2P(C_5) + 2P(C_6) \tag{16} \]

To establish the general pattern, consider \(X = \sum_{j=1}^{m} c_j I_{C_j}\). We identify the distinct set of values contained
in the set \(\{c_j : 1 \le j \le m\}\). Suppose these are \(t_1 < t_2 < \cdots < t_n\). For any value \(t_i\) in the range, identify the
index set \(J_i\) of those j such that \(c_j = t_i\). Then the terms

\[ \sum_{J_i} c_j I_{C_j} = t_i \sum_{J_i} I_{C_j} = t_i I_{A_i}, \text{ where } A_i = \bigvee_{j \in J_i} C_j \tag{17} \]

By the additivity of probability

\[ P(A_i) = P(X = t_i) = \sum_{j \in J_i} P(C_j) \tag{18} \]

Since for each \(j \in J_i\) we have \(c_j = t_i\), we have

\[ E[X] = \sum_{i=1}^{n} t_i P(A_i) = \sum_{i=1}^{n} t_i \sum_{j \in J_i} P(C_j) = \sum_{i=1}^{n} \sum_{j \in J_i} c_j P(C_j) = \sum_{j=1}^{m} c_j P(C_j) \tag{19} \]

□

Thus, the defining expression for expectation holds for X in a primitive form.

An alternate approach to obtaining the expectation from a primitive form is to use the csort operation 
to determine the distribution of X from the coefficients and probabilities of the primitive form. 

Example 6: Alternate determinations of E [X] 

Suppose X in a primitive form is

\[ X = I_{C_1} + 2I_{C_2} + I_{C_3} + 3I_{C_4} + 2I_{C_5} + 2I_{C_6} + I_{C_7} + 3I_{C_8} + 2I_{C_9} + I_{C_{10}} \tag{20} \]

with respective probabilities

\[ P(C_i) = 0.08,\ 0.11,\ 0.06,\ 0.13,\ 0.05,\ 0.08,\ 0.12,\ 0.07,\ 0.14,\ 0.16 \tag{21} \]

c = [1 2 1 3 2 2 1 3 2 1];              % Matrix of coefficients
pc = 0.01*[8 11 6 13 5 8 12 7 14 16];   % Matrix of probabilities
EX = c*pc'                              % Direct solution

EX = 1.7800

[X,PX] = csort(c,pc);                   % Determination of dbn for X
disp([X;PX]')

    1.0000    0.4200
    2.0000    0.3800
    3.0000    0.2000

Ex = X*PX'                              % E[X] from distribution

Ex = 1.7800
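
For readers without the csort m-function, the consolidation step can be sketched in a few lines of Python. The helper `consolidate` below is a hypothetical stand-in that sorts the distinct coefficients and adds the probabilities of equal ones. It is not the csort implementation itself, only an illustration of the same idea on the data of this example.

```python
from collections import defaultdict
from fractions import Fraction as F

def consolidate(c, pc):
    """Sorted distinct values of c, with the probabilities of equal values added."""
    dist = defaultdict(F)                 # F() is the exact fraction 0
    for value, prob in zip(c, pc):
        dist[value] += prob
    values = sorted(dist)
    return values, [dist[v] for v in values]

c = [1, 2, 1, 3, 2, 2, 1, 3, 2, 1]                              # coefficients
pc = [F(k, 100) for k in (8, 11, 6, 13, 5, 8, 12, 7, 14, 16)]   # P(C_j)

direct = sum(cj * pj for cj, pj in zip(c, pc))       # from the primitive form
X, PX = consolidate(c, pc)
from_dist = sum(t * p for t, p in zip(X, PX))        # from the distribution

print(float(direct), float(from_dist))   # 1.78 1.78
print(X, [float(q) for q in PX])         # [1, 2, 3] [0.42, 0.38, 0.2]
```

Both routes give the same value because consolidation only regroups the terms of the sum, as in the verification above.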

Linearity 

The result on primitive forms may be used to establish the linearity of mathematical expectation for 
simple random variables. Because of its fundamental importance, we work through the verification in some 
detail. 

Suppose \(X = \sum_{i=1}^{n} t_i I_{A_i}\) and \(Y = \sum_{j=1}^{m} u_j I_{B_j}\) (both in canonical form). Since

\[ \sum_{i=1}^{n} I_{A_i} = \sum_{j=1}^{m} I_{B_j} = 1 \tag{22} \]

we have

\[ X + Y = \sum_{i=1}^{n} t_i I_{A_i} \left( \sum_{j=1}^{m} I_{B_j} \right) + \sum_{j=1}^{m} u_j I_{B_j} \left( \sum_{i=1}^{n} I_{A_i} \right) = \sum_{i=1}^{n} \sum_{j=1}^{m} (t_i + u_j) I_{A_i} I_{B_j} \tag{23} \]

Note that \(I_{A_i} I_{B_j} = I_{A_i B_j}\) and \(A_i B_j = \{X = t_i, Y = u_j\}\). The class of these sets for all possible pairs (i, j)
forms a partition. Thus, the last summation expresses Z = X + Y in a primitive form. Because of the result
on primitive forms, above, we have

\[ E[X + Y] = \sum_{i=1}^{n} \sum_{j=1}^{m} (t_i + u_j) P(A_i B_j) = \sum_{i=1}^{n} \sum_{j=1}^{m} t_i P(A_i B_j) + \sum_{i=1}^{n} \sum_{j=1}^{m} u_j P(A_i B_j) \tag{24} \]

\[ = \sum_{i=1}^{n} t_i \sum_{j=1}^{m} P(A_i B_j) + \sum_{j=1}^{m} u_j \sum_{i=1}^{n} P(A_i B_j) \tag{25} \]

We note that for each i and for each j

\[ P(A_i) = \sum_{j=1}^{m} P(A_i B_j) \quad \text{and} \quad P(B_j) = \sum_{i=1}^{n} P(A_i B_j) \tag{26} \]

Hence, we may write

\[ E[X + Y] = \sum_{i=1}^{n} t_i P(A_i) + \sum_{j=1}^{m} u_j P(B_j) = E[X] + E[Y] \tag{27} \]

Now aX and bY are simple if X and Y are, so that with the aid of Example 1 (Some special cases) we have

\[ E[aX + bY] = E[aX] + E[bY] = aE[X] + bE[Y] \tag{28} \]

If X, Y, Z are simple, then so are aX + bY and cZ. It follows that

\[ E[aX + bY + cZ] = E[aX + bY] + cE[Z] = aE[X] + bE[Y] + cE[Z] \tag{29} \]

By an inductive argument, this pattern may be extended to a linear combination of any finite number of 
simple random variables. Thus we may assert 

Linearity. The expectation of a linear combination of a finite number of simple random variables is that 
linear combination of the expectations of the individual random variables. 
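
The linearity property can be checked numerically on a small joint distribution. In the Python sketch below, the values, the joint probabilities, and the constants a and b are all hypothetical; the point is only that the expectation computed on the joint partition agrees with the linear combination of the marginal expectations.

```python
from fractions import Fraction as F

t = [0, 1, 3]      # values of X
u = [-1, 2]        # values of Y
# joint[i][j] = P(X = t[i], Y = u[j]); hypothetical numbers that sum to 1.
joint = [[F(1, 10), F(2, 10)],
         [F(3, 10), F(1, 10)],
         [F(1, 10), F(2, 10)]]
assert sum(map(sum, joint)) == 1
a, b = 2, -3

# Marginal probabilities P(A_i) and P(B_j), obtained by additivity.
px = [sum(row) for row in joint]
py = [sum(joint[i][j] for i in range(len(t))) for j in range(len(u))]
EX = sum(ti * q for ti, q in zip(t, px))
EY = sum(uj * q for uj, q in zip(u, py))

# E[aX + bY] from the primitive form on the partition {A_i B_j}.
E_comb = sum((a * t[i] + b * u[j]) * joint[i][j]
             for i in range(len(t)) for j in range(len(u)))

print(E_comb == a * EX + b * EY)   # True
```

No independence of X and Y is assumed here; linearity holds for any joint distribution.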

□

Expectation of a simple random variable in affine form

As a direct consequence of linearity, whenever simple random variable X is in affine form, then

\[ E[X] = E\left[ c_0 + \sum_{i=1}^{n} c_i I_{E_i} \right] = c_0 + \sum_{i=1}^{n} c_i P(E_i) \tag{30} \]



Thus, the defining expression holds for any affine combination of indicator functions, whether in canonical 
form or not. 

Example 7: Binomial distribution (n, p) 

This random variable appears as the number of successes in n Bernoulli trials with probability p
of success on each component trial. It is naturally expressed in affine form

\[ X = \sum_{i=1}^{n} I_{E_i} \quad \text{so that} \quad E[X] = \sum_{i=1}^{n} p = np \tag{31} \]

Alternately, in canonical form

\[ X = \sum_{k=0}^{n} k I_{A_{kn}}, \quad \text{with} \quad p_k = P(A_{kn}) = P(X = k) = C(n, k) p^k q^{n-k}, \quad q = 1 - p \tag{32} \]

so that

\[ E[X] = \sum_{k=0}^{n} k\, C(n, k) p^k q^{n-k}, \quad q = 1 - p \tag{33} \]

Some algebraic tricks may be used to show that the second form sums to np, but there is no need
of that. The computation for the affine form is much simpler.
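
As a numerical sanity check that the canonical-form sum does equal np, the Python sketch below evaluates it exactly for one hypothetical choice of n and p. The function name is our own. The usual algebraic route rests on the identity k C(n, k) = n C(n-1, k-1), which this check does not use.

```python
from fractions import Fraction as F
from math import comb

def binomial_mean_canonical(n, p):
    """Evaluate the sum over k of k * C(n, k) * p**k * q**(n - k), with q = 1 - p."""
    q = 1 - p
    return sum(k * comb(n, k) * p**k * q**(n - k) for k in range(n + 1))

n, p = 10, F(3, 10)        # hypothetical parameters, for illustration only
print(binomial_mean_canonical(n, p), n * p)   # 3 3
```

Exact rational arithmetic is used so that the comparison with np is an equality rather than an approximate match.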

Example 8: Expected winnings 

A bettor places three bets at $2.00 each. The first bet pays $10.00 with probability 0.15, the 
second pays $8.00 with probability 0.20, and the third pays $20.00 with probability 0.10. What is 
the expected gain? 
SOLUTION 

The net gain may be expressed

\[ X = 10I_A + 8I_B + 20I_C - 6, \quad \text{with} \quad P(A) = 0.15, \ P(B) = 0.20, \ P(C) = 0.10 \tag{34} \]

Then

\[ E[X] = 10 \cdot 0.15 + 8 \cdot 0.20 + 20 \cdot 0.10 - 6 = -0.90 \tag{35} \]

These calculations may be done in MATLAB as follows:

c = [10 8 20 -6];
p = [0.15 0.20 0.10 1.00];   % Constant a = aI_(Omega), with P(Omega) = 1
E = c*p'

E = -0.9000

Functions of simple random variables 

If X is in a primitive form (including canonical form) and g is a real function defined on the range of X,
then

\[ Z = g(X) = \sum_{j=1}^{m} g(c_j) I_{C_j} \quad \text{a primitive form} \tag{36} \]

so that

\[ E[Z] = E[g(X)] = \sum_{j=1}^{m} g(c_j) P(C_j) \tag{37} \]

Alternately, we may use csort to determine the distribution for Z and work with that distribution.

Caution. If X is in affine form (but not a primitive form)

\[ X = c_0 + \sum_{j=1}^{m} c_j I_{E_j} \quad \text{then} \quad g(X) \ne g(c_0) + \sum_{j=1}^{m} g(c_j) I_{E_j} \tag{38} \]

so that

\[ E[g(X)] \ne g(c_0) + \sum_{j=1}^{m} g(c_j) P(E_j) \tag{39} \]
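
A small numerical case makes the caution concrete. The Python sketch below assumes X is the sum of the indicators of two events that are independent with probability 0.5 each, and takes g(t) = t^2. Both the independence and the numbers are assumptions made for illustration. The exact value is found by enumerating the partition generated by the events; the naive value applies the primitive-form formula to the affine coefficients.

```python
from itertools import product

pE = [0.5, 0.5]        # assumed P(E_1), P(E_2); the events are assumed independent
c0, c = 0, [1, 1]      # affine form X = c0 + c[0]*I_E1 + c[1]*I_E2

def g(x):
    return x ** 2

# Exact E[g(X)]: enumerate every pattern of occurrence of E_1 and E_2.
exact = 0.0
for pattern in product([0, 1], repeat=len(c)):
    prob = 1.0
    for bit, q in zip(pattern, pE):
        prob *= q if bit else 1 - q     # independence: multiply the factors
    x = c0 + sum(cj * bit for cj, bit in zip(c, pattern))
    exact += g(x) * prob

# Naive (incorrect) use of the primitive-form formula on the affine form.
naive = g(c0) + sum(g(cj) * q for cj, q in zip(c, pE))

print(exact, naive)   # 1.5 1.0
```

The gap comes from the outcomes in which both events occur: there X = 2 and g(X) = 4, whereas the naive formula credits g(1) + g(1) = 2.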



Example 9: Expectation of a function of X 

Suppose X in a primitive form is

\[ X = -3I_{C_1} - I_{C_2} + 2I_{C_3} - 3I_{C_4} + 4I_{C_5} - I_{C_6} + I_{C_7} + 2I_{C_8} + 3I_{C_9} + 2I_{C_{10}} \tag{40} \]

with probabilities \(P(C_i)\) = 0.08, 0.11, 0.06, 0.13, 0.05, 0.08, 0.12, 0.07, 0.14, 0.16.
Let \(g(t) = t^2 + 2t\). Determine E[g(X)].

c = [-3 -1 2 -3 4 -1 1 2 3 2];          % Original coefficients
pc = 0.01*[8 11 6 13 5 8 12 7 14 16];   % Probabilities for C_j
G = c.^2 + 2*c                          % g(c_j)

G = 3 -1 8 3 24 -1 3 8 15 8

EG = G*pc'                              % Direct computation

EG = 6.4200

[Z,PZ] = csort(G,pc);                   % Distribution for Z = g(X)
disp([Z;PZ]')                           % Optional display

   -1.0000    0.1900
    3.0000    0.3300
    8.0000    0.2900
   15.0000    0.1400
   24.0000    0.0500

EZ = Z*PZ'                              % E[Z] from distribution for Z

EZ = 6.4200



A similar approach can be made to a function of a pair of simple random variables, provided the joint
distribution is available. Suppose \(X = \sum_{i=1}^{n} t_i I_{A_i}\) and \(Y = \sum_{j=1}^{m} u_j I_{B_j}\) (both in canonical form). Then

\[ Z = g(X, Y) = \sum_{i=1}^{n} \sum_{j=1}^{m} g(t_i, u_j) I_{A_i B_j} \tag{41} \]

The \(A_i B_j\) form a partition, so Z is in a primitive form. We have the same two alternative possibilities: (1)
direct calculation from values of \(g(t_i, u_j)\) and corresponding probabilities \(P(A_i B_j) = P(X = t_i, Y = u_j)\),
or (2) use of csort to obtain the distribution for Z.
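
The two alternatives can be sketched on a small case in Python. The joint distribution and the function g below are hypothetical, and the dictionary-based consolidation is only a stand-in for what csort does; it is not the csort code.

```python
from collections import defaultdict
from fractions import Fraction as F

t = [0, 1, 3]      # values of X
u = [-1, 2]        # values of Y
# joint[i][j] = P(X = t[i], Y = u[j]); hypothetical numbers that sum to 1.
joint = [[F(1, 10), F(2, 10)],
         [F(3, 10), F(1, 10)],
         [F(1, 10), F(2, 10)]]

def g(x, y):
    return (x - y) ** 2    # hypothetical function of the pair

# (1) Direct calculation over the partition {A_i B_j}.
direct = sum(g(t[i], u[j]) * joint[i][j]
             for i in range(len(t)) for j in range(len(u)))

# (2) Distribution of Z = g(X, Y): add the probabilities of equal values.
pz = defaultdict(F)
for i in range(len(t)):
    for j in range(len(u)):
        pz[g(t[i], u[j])] += joint[i][j]
from_dist = sum(z * q for z, q in pz.items())

print(direct, from_dist)   # 4 4
```

Here Z takes only the three values 1, 4, and 16, although the partition has six members, so the second route also yields a more compact description of Z.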

Example 10: Expectation for Z = g (X, Y) 

We use the joint distribution in file jdemo1.m and let \(g(t, u) = t^2 + 2tu - 3u\). To set up for
calculations, we use jcalc.

% file jdemo1.m
% (Two entries of X, marked __ below, are illegible in this copy.
%  X must have ten entries to match the ten columns of P.)
X = [-2.37 -1.93 -0.47 -0.11 __ 0.57 1.22 2.15 2.97 __];
Y = [-3.06 -1.44 -1.21 0.07 0.38 1.77 2.01 2.84];
P = 0.0001*[ 53   8 167 170 184  18  67 122  18  12;
             11  13 143 221 241 153  87 125 122 185;
            165 129 226 185  89 215  40  77  93 187;
            165 163 205  64  60  66 118 239  67 201;
            227   2 128  12 238 106 218 120 222  30;
             93  93  22 179 175 186 221  65 129   4;
            126  16 159  80 183 116  15  22 113 167;
            198 101 101 154 158  58 220 230 228 211];

jdemo1                       % Call for data
jcalc                        % Set up
Enter JOINT PROBABILITIES (as on the plane)  P
Enter row matrix of VALUES of X  X
Enter row matrix of VALUES of Y  Y
 Use array operations on matrices X, Y, PX, PY, t, u, and P
G = t.^2 + 2*t.*u - 3*u;     % Calculation of matrix of [g(t_i, u_j)]
EG = total(G.*P)             % Direct calculation of expectation

EG = 3.2529

[Z,PZ] = csort(G,P);         % Determination of distribution for Z
EZ = Z*PZ'                   % E[Z] from distribution

EZ = 3.2529